Smart Paper Notes

R2FD2: Fast and Robust Matching of Multimodal Remote Sensing Image via Repeatable Feature Detector and Rotation-invariant Feature Descriptor

Bai Zhu, Chao Yang, Jinkun Dai, Jianwei Fan, Yuanxin Ye

Category: Computer Vision

2022-12-05

Automatically identifying feature correspondences between multimodal images faces enormous challenges because of significant differences in both radiation and geometry. To address these problems, we propose a novel feature matching method, named R2FD2, that is robust to radiation and rotation differences. R2FD2 makes two key contributions: a repeatable feature detector and a rotation-invariant feature descriptor. In the first stage, a repeatable feature detector called the Multi-channel Auto-correlation of the Log-Gabor is presented for feature detection; it combines the multi-channel auto-correlation strategy with Log-Gabor wavelets to detect interest points with high repeatability and uniform distribution. In the second stage, a rotation-invariant feature descriptor named the Rotation-invariant Maximum index map of the Log-Gabor (RMLG) is constructed. It consists of two components: fast assignment of the dominant orientation and construction of the feature representation. For fast dominant-orientation assignment, a Rotation-invariant Maximum Index Map (RMIM) is built to handle rotation deformations. The proposed RMLG then combines the rotation-invariant RMIM with the spatial configuration of DAISY to produce a more discriminative feature representation, which improves RMLG's resistance to radiation and rotation variations.
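
The maximum-index-map idea can be illustrated with a minimal pure-Python sketch. It assumes the Log-Gabor amplitude responses for each orientation are already computed (filter construction is omitted), and it uses the most frequent index in a patch as a stand-in for the dominant orientation. This is a simplified illustration under those assumptions, not the authors' RMIM or RMLG construction.

```python
from collections import Counter

def maximum_index_map(responses):
    """responses[o][y][x] is the filter amplitude at orientation o.
    Returns, for every pixel, the orientation index with the largest amplitude."""
    n_orient = len(responses)
    height, width = len(responses[0]), len(responses[0][0])
    return [[max(range(n_orient), key=lambda o: responses[o][y][x])
             for x in range(width)]
            for y in range(height)]

def rotation_invariant_mim(mim, n_orient):
    """Re-express every index relative to the patch's most frequent index
    (used here as the dominant orientation). Rotating the patch by a multiple
    of the orientation step shifts all index values by the same k (pixel
    positions also move, which this sketch ignores), so the index values are
    unchanged, provided the most frequent index is unique."""
    counts = Counter(v for row in mim for v in row)
    dominant = max(sorted(counts), key=lambda v: counts[v])
    return [[(v - dominant) % n_orient for v in row] for row in mim]
```

For example, with four orientations, the index rows `[1, 1, 2, 3]` and `[3, 3, 0, 1]` (the same pattern shifted by two orientation steps) both normalize to `[0, 0, 1, 2]`.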

translated by Google Translate

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee

Category: Computer Vision

2022-11-07

Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can achieve real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with this NPU and reach up to 60 FPS when reconstructing Full HD resolution images. This paper provides a detailed description of all models developed in the challenge.
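
Since the challenge required INT8 models, a minimal sketch of standard affine (scale and zero-point) INT8 quantization may help. It is a generic textbook scheme written in pure Python, not the specific quantization toolchain used by the participants or by the VS680 NPU.

```python
def int8_qparams(x_min, x_max, qmin=-128, qmax=127):
    """Asymmetric (affine) INT8 parameters mapping [x_min, x_max] onto [qmin, qmax]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # the range must include 0.0
    scale = (x_max - x_min) / (qmax - qmin) or 1.0    # guard against an all-zero range
    zero_point = round(qmin - x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Float -> INT8 code, saturating at the ends of the integer range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize(q, scale, zero_point):
    """INT8 code -> approximate float value."""
    return (q - zero_point) * scale
```

For activations in the range [0, 2.55], the scale is 0.01 and the zero point is -128, so 1.0 maps to the code -28 and back to approximately 1.0; values above the range saturate at 127.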

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang

Category: Natural Language Processing

2022-11-07

Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method to cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework in which, given a speech example and its transcription, we randomly mask the spectrogram and the phonemes. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both training and inference, without any finetuning effort. In cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods. The code and model are publicly available at PaddleSpeech.
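
The joint masking step can be sketched as follows, assuming span masking over spectrogram frames and token masking over phonemes. The mask ratios, span length, zero-vector frame mask, and `[MASK]` token are hypothetical choices for illustration, not ERNIE-SAT's published settings.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder for a masked phoneme

def mask_spans(seq, ratio, span_len, rng, mask_value):
    """Mask random contiguous spans until at least ratio * len(seq) items are covered.
    Returns the masked copy and the sorted masked positions."""
    out = list(seq)
    target = int(len(out) * ratio)
    masked = set()
    while len(masked) < target:
        start = rng.randrange(len(out))
        masked.update(range(start, min(len(out), start + span_len)))
    for i in masked:
        out[i] = mask_value  # the same mask object is reused; it is never mutated
    return out, sorted(masked)

def mask_speech_text_pair(frames, phonemes, rng, frame_ratio=0.5, phone_ratio=0.3):
    """Mask spectrogram frames (replaced by zero vectors) and phonemes (replaced
    by MASK_TOKEN); a model would be trained to reconstruct both."""
    zero_frame = [0.0] * len(frames[0])
    masked_frames, frame_idx = mask_spans(frames, frame_ratio, 3, rng, zero_frame)
    masked_phones, phone_idx = mask_spans(phonemes, phone_ratio, 1, rng, MASK_TOKEN)
    return masked_frames, frame_idx, masked_phones, phone_idx
```

Passing a seeded `random.Random` instance keeps the masking reproducible, which is convenient when debugging a reconstruction loss.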

Attributed Network Embedding Model for Exposing COVID-19 Spread Trajectory Archetypes

Junwei Ma, Bo Li, Qingchun Li, Chao Fan, Ali Mostafavi

Category: Machine Learning

2022-09-20

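The spread of COVID-19 showed that transmission risk patterns are not homogeneous across different cities and communities, and that various heterogeneous features can influence spread trajectories. Hence, for predictive pandemic monitoring, it is essential to explore the latent heterogeneous features of cities and communities that distinguish their specific pandemic spread trajectories. To this end, this study creates a network embedding model that captures the cross-county visitation network together with heterogeneous features, in order to uncover clusters of U.S. counties based on their pandemic spread trajectories. We collected location intelligence features for 2,787 counties from March 3 to June 29, 2020 (the initial wave). Next, we constructed a human visitation network that uses county features as node attributes and visits between counties as network edges. Our attributed network embedding approach integrates both the topological characteristics of the cross-county visitation network and the heterogeneous features. We performed cluster analysis on the attributed network embeddings to reveal four archetypes of differential risk trajectories, corresponding to four clusters of counties. We then identified four features as important characteristics underlying the distinct transmission risk patterns among the archetypes. The attributed network embedding approach and the findings identify and explain the non-homogeneous pandemic risk trajectories across counties for predictive pandemic monitoring. The study also contributes data-driven and deep-learning-based approaches to pandemic analytics that complement standard epidemiological models for pandemic policy analysis.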
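
The embed-then-cluster pipeline can be sketched with a deliberately simple stand-in. Each county's embedding concatenates its own attributes with the visit-weighted mean of the attributes of the counties it sends visits to, and k-means then groups the embeddings. The study's actual embedding model is not specified in this abstract; the aggregation rule, the toy data, and k = 2 (the study reports four clusters) are assumptions made for illustration.

```python
def attributed_embedding(attrs, visits):
    """attrs: {county: [feature, ...]}; visits: {(src, dst): visit_count}.
    Embedding = own features + visit-weighted mean of visited counties' features."""
    dim = len(next(iter(attrs.values())))
    emb = {}
    for county, own in attrs.items():
        total, agg = 0.0, [0.0] * dim
        for (src, dst), w in visits.items():
            if src == county:
                total += w
                agg = [a + w * f for a, f in zip(agg, attrs[dst])]
        neighbor_mean = [a / total for a in agg] if total else [0.0] * dim
        emb[county] = list(own) + neighbor_mean
    return emb

def _dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    keys = sorted(points)
    centers = [points[keys[0]]]
    while len(centers) < k:
        far = max(keys, key=lambda c: min(_dist2(points[c], m) for m in centers))
        centers.append(points[far])
    labels = {}
    for _ in range(iters):
        labels = {c: min(range(k), key=lambda j: _dist2(points[c], centers[j]))
                  for c in keys}
        for j in range(k):
            members = [points[c] for c in keys if labels[c] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

On a toy network where counties A and B exchange visits and have similar attributes, as do C and D, the two pairs land in different clusters.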

Learning Gait Representation from Massive Unlabelled Walking Videos: A Benchmark

Chao Fan, Saihui Hou, Jilong Wang, Yongzhen Huang, Shiqi Yu

Category: Computer Vision

2022-06-28

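Gait depicts an individual's unique and distinctive walking pattern and has become one of the most promising biometric features for human identification. As a fine-grained recognition task, gait recognition is easily affected by many factors and usually requires a large amount of fully annotated data, which is costly and hard to obtain. This paper proposes a large-scale self-supervised benchmark for gait recognition with contrastive learning, aiming to learn general gait representations from massive unlabelled walking videos for practical applications by providing informative walking priors and diverse real-world variations. Specifically, we collect GaitLU-1M, a large-scale unlabelled gait dataset of 1.02M walking sequences, and propose GaitSSB, a conceptually simple yet empirically powerful baseline model. Experimentally, we evaluate the pre-trained model on four widely used gait benchmarks (CASIA-B, OU-MVLP, GREW, and Gait3D), with and without transfer learning. The unsupervised results are comparable to, or even better than, early model-based and GEI-based methods. After transfer learning, our method outperforms existing methods in most cases. Theoretically, we discuss the critical issues of a gait-specific contrastive framework and offer some insights for further research. To the best of our knowledge, GaitLU-1M is the first large-scale unlabelled gait dataset, and GaitSSB is the first method to achieve remarkable unsupervised results on the aforementioned benchmarks. The source code of GaitSSB will be integrated into OpenGait, available at https://github.com/shiqiyu/opengait.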
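
For readers unfamiliar with contrastive learning, a minimal InfoNCE loss in pure Python shows the basic idea: two views of the same walking sequence (an anchor and its positive) should be more similar than views of different sequences in the batch. The abstract does not give GaitSSB's exact objective, so this generic loss and its temperature of 0.1 are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i]; every other
    positives[j] in the batch acts as a negative."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum_exp - logits[i]
    return loss / len(anchors)
```

When each anchor matches its own positive the loss is near zero; swapping the positives so every anchor is closest to a negative drives it to about 10 with this temperature.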

GaitEdge: Beyond Plain End-to-end Gait Recognition for Better Practicality

Junhao Liang, Chao Fan, Saihui Hou, Chuanfu Shen, Yongzhen Huang, Shiqi Yu

Category: Computer Vision

2022-03-08

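Gait is one of the most promising biometrics for identifying individuals at long distances. Although most previous methods focus on recognizing silhouettes, several end-to-end methods that extract gait features directly from RGB images perform better. However, we show that these end-to-end methods may inevitably suffer from gait-irrelevant noise, i.e., low-level texture and color information. Experimentally, we design a cross-domain evaluation to support this view. In this work, we propose a novel end-to-end framework named GaitEdge, which effectively blocks gait-irrelevant information and unlocks the potential of end-to-end training. Specifically, GaitEdge synthesizes the output of a pedestrian segmentation network and feeds it into the subsequent recognition network, where the synthetic silhouettes consist of trainable body edges and fixed interiors, limiting the information the recognition network receives. In addition, GaitAlign, which aligns the silhouettes, is embedded into GaitEdge without losing differentiability. Experimental results on CASIA-B and our newly built TTG-200 show that GaitEdge significantly outperforms previous methods and provides a more practical end-to-end paradigm. All source code is available at https://github.com/shiqiyu/opengait.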
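
The "trainable edges plus fixed interiors" idea can be sketched on plain Python lists, assuming that a 3x3 dilation and erosion of the thresholded segmentation map define the edge band and the interior. In an actual framework `prob` would be a differentiable tensor, so gradients would reach the segmentation network only through the edge band. The threshold and kernel size here are hypothetical.

```python
def _morph(mask, op):
    """3x3 binary dilation (op=max) or erosion (op=min); out-of-bounds pixels
    are treated as background (0)."""
    h, w = len(mask), len(mask[0])

    def at(y, x):
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[op(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]

def synthesize_silhouette(prob, threshold=0.5):
    """prob: segmentation probabilities in [0, 1].
    Edge-band pixels keep the (trainable) probability, interior pixels are
    fixed to 1, and everything else is 0."""
    binary = [[1 if p > threshold else 0 for p in row] for row in prob]
    dilated, eroded = _morph(binary, max), _morph(binary, min)
    out = []
    for y, row in enumerate(prob):
        out_row = []
        for x, p in enumerate(row):
            edge = dilated[y][x] - eroded[y][x]  # 1 on the band around the boundary
            out_row.append(edge * p + eroded[y][x])
        out.append(out_row)
    return out
```

For a 5x5 map with a 3x3 block of probability 0.9, only the block's center survives erosion and becomes a fixed 1.0, while the surrounding band keeps its raw probabilities.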

Learned Image Compression with Separate Hyperprior Decoders

Zhao Zan, Chao Liu, Heming Sun, Xiaoyang Zeng, Yibo Fan

Category: Computer Vision

2021-10-31

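Learned image compression techniques have developed considerably in recent years. In this paper, we find that the performance bottleneck lies in the use of a single hyperprior decoder, in which case the ternary Gaussian model collapses to a binary one. To address this problem, we propose using three hyperprior decoders to separate the decoding of the mixture parameters in discrete Gaussian mixture likelihoods, achieving more accurate parameter estimation. Experimental results show that, when optimized for MS-SSIM, the proposed method achieves a 3.36% BD-rate reduction compared with the state-of-the-art method. The proposed method adds only a negligible amount to encoding time and FLOPs.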
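
A discrete Gaussian mixture likelihood assigns each quantized latent value a probability mass by integrating every Gaussian component over a unit-width bin. In the proposed method the weights, means, and scales would come from three separate hyperprior decoders; in this sketch they are simply passed in, and the example numbers are made up.

```python
import math

def _std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def discretized_gmm_prob(y, weights, means, scales):
    """P(y) = sum_k w_k * [Phi((y + 0.5 - mu_k) / s_k) - Phi((y - 0.5 - mu_k) / s_k)]
    for an integer-quantized latent y; the weights are assumed to sum to 1."""
    return sum(w * (_std_normal_cdf((y + 0.5 - m) / s) - _std_normal_cdf((y - 0.5 - m) / s))
               for w, m, s in zip(weights, means, scales))

def estimated_bits(y, weights, means, scales, eps=1e-9):
    """Estimated rate of coding y: -log2 P(y), floored at eps to avoid log(0)."""
    return -math.log2(max(discretized_gmm_prob(y, weights, means, scales), eps))
```

With three components, for instance weights `[0.5, 0.3, 0.2]`, means `[0.0, 2.0, -3.0]`, and scales `[1.0, 0.5, 2.0]`, the masses over all integers sum to 1, and symbols far from every mean cost many more bits than symbols near a mean.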

FAR Planner: Fast, Attemptable Route Planner using Dynamic Visibility Update

Fan Yang, Chao Cao, Hongbiao Zhu, Jean Oh, Ji Zhang

Category: Robotics

2021-10-18

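Path planning in unknown environments remains a challenging problem: because the environment is observed gradually during navigation, the underlying planner must update its representation of the environment and replan, promptly and continually, to account for new observations. In this paper, we present a graph-based planning framework capable of handling navigation tasks in both known and unknown environments. The planner uses a polygonal representation of the environment, constructed by extracting edge points around obstacles to form closed polygons. The method then dynamically updates a global visibility graph using a two-layer data structure, expanding visibility edges as navigation proceeds and removing edges that become blocked by newly observed obstacles. When navigating in unknown environments, the method can attempt to discover a way to the goal by picking up the environment layout on the fly, updating the visibility graph, and quickly replanning according to the newly observed environment. We evaluate the method in both simulated and real-world settings. The method demonstrates the ability to attempt and navigate through unknown environments, reducing travel time by up to 12-47% compared with search-based methods (A* and D* Lite), and also compared with sampling-based methods (RRT*, BIT*, and SPARS).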
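
A flat visibility graph with occlusion-based edge removal can be sketched in pure Python. This omits FAR Planner's two-layer data structure and polygon extraction: obstacles are given directly as line segments, and the intersection test deliberately ignores touching or collinear cases, so it is an illustration rather than a robust geometric kernel.

```python
from itertools import combinations

def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): +1, -1, or 0."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def _segments_cross(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 properly intersect (shared endpoints
    and collinear touching are ignored for simplicity)."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0 and
            _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)

def visibility_graph(vertices, obstacle_edges):
    """Connect every pair of vertices whose straight segment crosses no obstacle edge."""
    graph = {v: set() for v in vertices}
    for a, b in combinations(vertices, 2):
        if not any(_segments_cross(a, b, e1, e2) for e1, e2 in obstacle_edges):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def remove_occluded(graph, new_edge):
    """Drop existing visibility edges blocked by a newly observed obstacle edge."""
    e1, e2 = new_edge
    for a in list(graph):
        for b in list(graph[a]):
            if _segments_cross(a, b, e1, e2):
                graph[a].discard(b)  # the test is symmetric, so (b, a) is removed too
```

Observing a wall from (2, -1) to (2, 1) removes the direct edge between (0, 0) and (4, 0), while both still see the vertex (2, 2). Incremental removal then yields the same graph as rebuilding from scratch with the wall included.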

POLAR: A Polynomial Arithmetic Framework for Verifying Neural-Network Controlled Systems

Chao Huang, Jiameng Fan, Xin Chen, Wenchao Li, Qi Zhu

Category: Machine Learning

2021-06-25

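We present POLAR, a POLynomial ARithmetic framework that leverages polynomial overapproximations with interval remainders for bounded-time reachability analysis of neural-network controlled systems (NNCSs). Compared with existing arithmetic approaches that use standard Taylor models, our framework uses a novel approach that iteratively overapproximates the neuron output ranges layer by layer, combining Bernstein polynomial interpolation for continuous activation functions with Taylor model arithmetic for the other operations. This approach overcomes a major drawback of standard Taylor model arithmetic, namely its inability to handle functions that cannot be well approximated by Taylor polynomials, and significantly improves the accuracy and efficiency of reachable-state computation for NNCSs. To further tighten the overapproximation, our method keeps the Taylor models symbolic under linear mappings when estimating the output range of a neural network. We show that POLAR can be seamlessly integrated with existing Taylor model flowpipe construction techniques, and demonstrate that POLAR significantly outperforms current state-of-the-art techniques on a suite of benchmarks.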
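
Bernstein polynomial interpolation of an activation function over an input interval [a, b] can be sketched as follows. The sampled error estimate is only illustrative: an overapproximation needs a guaranteed interval remainder, which this sketch does not compute.

```python
import math

def bernstein_approx(f, a, b, n):
    """Return a callable B_n(f) on [a, b]:
    B_n(f)(x) = sum_k f(a + k(b - a)/n) * C(n, k) * t^k * (1 - t)^(n - k),
    with t = (x - a) / (b - a)."""
    samples = [f(a + k * (b - a) / n) for k in range(n + 1)]

    def poly(x):
        t = (x - a) / (b - a)
        return sum(samples[k] * math.comb(n, k) * t ** k * (1 - t) ** (n - k)
                   for k in range(n + 1))

    return poly

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sampled_error(f, p, a, b, m=200):
    """Max |f - p| over m + 1 evenly spaced points: an estimate, not a sound bound."""
    return max(abs(f(a + i * (b - a) / m) - p(a + i * (b - a) / m)) for i in range(m + 1))
```

The polynomial matches the activation exactly at both interval endpoints, and the sampled error shrinks as the degree n grows (for example, from n = 5 to n = 20 on [-2, 2] for the sigmoid).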

A Cooperative-Competitive Multi-Agent Framework for Auto-bidding in Online Advertising

Chao Wen, Miao Xu, Zhilin Zhang, Zhenzhe Zheng, Yuhui Wang, Xiangyu Liu, Yu Rong, Dong Xie, Xiaoyang Tan, Chuan Yu

Category: Artificial Intelligence

2021-06-11

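In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous work designed auto-bidding tools from a single-agent perspective, without modeling the mutual influence between agents. In this paper, we instead consider this problem from a distributed multi-agent perspective and propose a general Multi-Agent reinforcement learning framework for Auto-Bidding, namely MAAB, to learn auto-bidding strategies. First, we investigate the competitive and cooperative relationships among auto-bidding agents and propose a temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully balancing competition and cooperation among agents, we can reach an equilibrium that guarantees not only the utility of individual advertisers but also the system performance (i.e., social welfare). Second, to avoid the potential collusive behavior of bidding low prices that cooperation can induce, we further propose bar agents that set a personalized bidding bar for each agent, thereby alleviating the revenue degradation caused by cooperation. Third, to deploy MAAB in large-scale advertising systems, we propose a mean-field approach. By grouping advertisers that share the same objective as a mean auto-bidding agent, the interactions among large numbers of advertisers are greatly simplified, making it practical to train MAAB efficiently. Extensive experiments on an offline industrial dataset and the Alibaba advertising platform show that our method outperforms several baseline methods in terms of social welfare and revenue.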
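
The mean-field grouping and the bidding bar can be sketched in a few lines, under loud assumptions: advertisers are plain dicts with hypothetical `objective` and `bid` fields, a group's mean agent acts with the group's mean bid, and the bar acts as a lower bound on a bid. The paper's temperature-regularized credit assignment and learned bar agents are not modeled.

```python
from collections import defaultdict

def mean_field_groups(advertisers):
    """advertisers: list of dicts with hypothetical 'objective' and 'bid' keys.
    Group advertisers that share an objective and return each group's mean bid,
    i.e. the action of one representative 'mean' agent per group."""
    groups = defaultdict(list)
    for adv in advertisers:
        groups[adv["objective"]].append(adv["bid"])
    return {obj: sum(bids) / len(bids) for obj, bids in groups.items()}

def apply_bid_bar(bid, bar):
    """Hypothetical bidding bar: a bid is never allowed to fall below the agent's bar."""
    return max(bid, bar)
```

Collapsing thousands of advertisers into one mean agent per objective shrinks the number of interacting policies, which is the scalability argument the abstract makes for the mean-field approach.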